I'm about to graduate from The Fletcher School, and I want to make sure that IF I return to Boston post-covid, I find the right place to live. Enter this short, somewhat silly project. The purpose of this analysis is to determine where I should live, based on factors I have identified as particularly important to my own individual happiness, including:
In case you would like to investigate the data I used for yourself, you'll find it in the data folder, which contains the following files:
Variables

- data/raster/NLCD/NLCD_2016_Impervious_Boston.tif - impervious surface raster for Boston and the surrounding area (GeoTIFF)
- data/raster/NLCD/NLCD_2016_Land_Cover_Boston.tif - land cover raster for Boston and the surrounding area (GeoTIFF)
- data/raster/NLCD/NLCD_Land_Cover_Legend.jpg - the key for the land cover raster (JPG)
- data/raster/NLCD/NLCD_2016_Tree_Canopy_Boston.tif - tree canopy raster for Boston and the surrounding area (GeoTIFF)
- data/vector/MassGIS/FARMERSMARKETS_PT/FARMERSMARKETS_PT.shp - farmers markets of Boston (ESRI shapefile)
- data/vector/MassGIS/LIBRARIES_PT/LIBRARIES_PT.shp - Boston libraries (ESRI shapefile)
- OpenStreetMap restaurant data for the Boston region - Boston region restaurants (OpenStreetMap import)
- data/tabular/ACSST5Y2019.S0101/ACSST5Y2019.S0101_data_with_overlays.csv - resident age data by zip code for Boston (.csv)

Maps

- data/vector/Census/tl_2010_25_zcta510/tl_2010_25_zcta510.shp - zip code boundaries shapefile for Boston (ESRI shapefile)
- data/vector/MassGIS/OUTLINE25K_POLY/OUTLINE25K_POLY.shp - Massachusetts outline
- data/vector/MassDOT/MPO_Boundaries/MPO_Boundaries.shp - Metropolitan Planning Organization (MPO) boundaries

Here is a rough outline of the analysis I will perform:
import rasterio
from rasterio.plot import show
from rasterio import features
import osmnx as ox
import numpy as np
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import folium
import contextily as cx
from scipy import ndimage
from rasterstats import zonal_stats
Before I get started with my variables, I want to check to see if all three of the rasters I'll be using have the same metadata, and load the metadata from one of them.
# Open all three rasters, check that their metadata is equivalent
with rasterio.open('data/raster/NLCD/NLCD_2016_Impervious_Boston.tif') as impervious_check:
    impervious_meta = impervious_check.meta
with rasterio.open('data/raster/NLCD/NLCD_2016_Land_Cover_Boston.tif') as landcover_check:
    landcover_meta = landcover_check.meta
with rasterio.open('data/raster/NLCD/NLCD_2016_Tree_Canopy_Boston.tif') as tree_check:
    tree_meta = tree_check.meta
impervious_meta == landcover_meta == tree_meta
True
# Read in the impervious surface area raster and extract the relevant spatial information.
with rasterio.open('data/raster/NLCD/NLCD_2016_Impervious_Boston.tif') as raster:
    bounds = raster.bounds
    crs = raster.crs
    nodata = raster.nodata
    shape = raster.shape
    transform = raster.transform
    res = raster.res
    impervious = raster.read(1)
Great! Let's get started.
Let's see those ZIP code tabulation areas (ZCTAs). I'm also going to convert these to the proper crs, which in this case is EPSG:6491, Massachusetts Mainland, and select only the ZCTAs which are in the Boston area (MPO = Boston Region). To get us started, we'll bring in the outline of Massachusetts - it'll help us to remove any overlap the ZCTAs have with the coastline.
# Bring in the Massachusetts mainland outline and convert it to EPSG:6491.
outline = gpd.read_file('data/vector/MassGIS/OUTLINE25K_POLY/OUTLINE25K_POLY.shp')
outline.to_crs('epsg:6491', inplace=True)
Now we'll bring in the ZCTAs and clip them according to the official state outline.
# Bring in the ZCTA file and again convert its crs.
zips = gpd.read_file('data/vector/Census/tl_2010_25_zcta510/tl_2010_25_zcta510.shp')
zips.to_crs('epsg:6491', inplace=True)
# Let's clip!
zips_intersect = gpd.clip(zips, outline)
zips_intersect.plot(facecolor='none', edgecolor='k')
Great. This will be more precise. Before we can get into just those ZCTAs that make up the Boston area, we have to extract just the Boston region from the MA Metropolitan Planning Organizations (MPOs).
# Bring in the MPO boundaries shapefile, select for Boston, and convert it to the proper crs.
mpo = gpd.read_file('data/vector/MassDOT/MPO_Boundaries/MPO_Boundaries.shp')
boston_mpo = mpo[mpo['MPO'] == 'Boston Region'].copy()
boston_mpo.to_crs('epsg:6491', inplace=True)
Beautiful. Now I know that some ZCTA boundaries could cross town (and therefore MPO) borders, so I'm going to find the centroid of each ZCTA and keep only those ZCTAs whose centroids are within the Boston Region MPO.
# Keep only the zips whose centroids fall within the Boston Region MPO!
zips_clipped = zips_intersect[zips_intersect.centroid.within(boston_mpo.geometry.iloc[0])]
# Let's see 'em!
zips_clipped.plot(facecolor='None', edgecolor='Black')
Let's also create a mask from the Boston Region MPO while we're at it - we'll use it later.
# Rasterize the Boston Region MPO
mask = features.rasterize(boston_mpo['geometry'], out_shape=shape,
                          fill=np.NaN, transform=transform, default_value=1)
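Why does multiplying by this mask work? Inside the MPO the mask is 1, so values pass through unchanged; outside it's NaN, and NaN propagates through multiplication. A toy sketch of the idea (made-up 3x3 values, not the real rasters):

```python
import numpy as np

# Made-up 3x3 "raster" and a mask that is 1 inside the region, NaN outside
raster = np.array([[10, 20, 30],
                   [40, 50, 60],
                   [70, 80, 90]], dtype=float)
region = np.array([[np.nan, 1.0, 1.0],
                   [np.nan, 1.0, 1.0],
                   [np.nan, np.nan, 1.0]])

masked = raster * region  # cells outside the region become NaN
print(masked)
```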
Now we have a map to work with. On to the variables!
I loaded the impervious surface raster above, so let's actually check that metadata and then see what the matrix looks like.
impervious_meta
{'driver': 'GTiff',
'dtype': 'uint8',
'nodata': None,
'width': 3322,
'height': 3024,
'count': 1,
'crs': CRS.from_epsg(6491),
'transform': Affine(30.0, 0.0, 181770.0,
0.0, -30.0, 948390.0)}
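The transform above is worth decoding: the Affine terms say each pixel is 30 m on a side, with the raster's top-left corner at (181770, 948390) in EPSG:6491. A quick sketch of how those terms map pixel indices to map coordinates (plain arithmetic, using the values printed above):

```python
# Affine terms from the metadata above: 30 m pixels, top-left corner at
# (181770, 948390); negative e because row indices increase downward
a, c = 30.0, 181770.0   # pixel width, x of the top-left corner
e, f = -30.0, 948390.0  # pixel height, y of the top-left corner

def pixel_to_coords(row, col):
    """Map a (row, col) pixel index to the map coordinates of its top-left corner."""
    return (c + a * col, f + e * row)

print(pixel_to_coords(0, 0))        # (181770.0, 948390.0), the raster origin
print(pixel_to_coords(3024, 3322))  # (281430.0, 857670.0), the bottom-right corner
```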
impervious
array([[ 0, 0, 0, ..., 127, 127, 127],
[ 0, 0, 0, ..., 127, 127, 127],
[ 0, 0, 0, ..., 127, 127, 127],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
Looks pretty solid. Now I want to get rid of all that pesky water on the coast.
# Multiply raster by the mask!
impervious_zip = impervious * mask
impervious_zip
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]])
show(impervious_zip, title="Impervious Surface Raster",
transform=transform, cmap='RdYlBu')
plt.show()
The values in this raster tell us the percentage of the pixel area that's covered by impervious surfaces. I'm going to reclassify the Impervious Surface Raster to my own tastes - I want to see things that aren't asphalt and experience less flooding. I'd also like to vote for green infrastructure with my feet.
| Min Imperviousness (Inclusive) | Max Imperviousness (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| ... | 10% | Very high suitability | 5 |
| 10% | 20% | High suitability | 4 |
| 20% | 60% | Medium suitability | 3 |
| 60% | 100% | Low suitability | 2 |
| 100% | ... | Very low suitability | 1 |
# Create a new impervious surface array
impervious_reclass = np.full(shape, np.NaN)
# Reclassify that surface!
impervious_reclass[impervious_zip < 10] = 5
impervious_reclass[(impervious_zip >= 10) & (impervious_zip < 20)] = 4
impervious_reclass[(impervious_zip >= 20) & (impervious_zip < 60)] = 3
impervious_reclass[(impervious_zip >= 60) & (impervious_zip < 100)] = 2
impervious_reclass[impervious_zip >= 100] = 1
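As an aside, the same binning can be written with np.digitize, which scales better if the breakpoints ever change; a sketch on a handful of made-up imperviousness values:

```python
import numpy as np

# Made-up imperviousness percentages, including a NaN for a masked cell
values = np.array([0.0, 5.0, 15.0, 45.0, 80.0, 100.0, np.nan])

# Bin edges matching the table: [0,10)->5, [10,20)->4, [20,60)->3, [60,100)->2, >=100 ->1
bins = [10, 20, 60, 100]
scores = np.array([5, 4, 3, 2, 1], dtype=float)

reclass = scores[np.digitize(values, bins)]
reclass[np.isnan(values)] = np.nan  # NaN sorts past the last bin, so restore it
print(reclass)  # scores: 5, 5, 4, 3, 2, 1, nan
```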
# Let's see what it looks like now!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(impervious_reclass, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Impervious Surface Raster", fontsize=20)
plt.show()
Alright, let's load in that land cover raster and get started!
with rasterio.open('data/raster/NLCD/NLCD_2016_Land_Cover_Boston.tif') as land:
landcover = land.read(1)
landcover
array([[41, 41, 41, ..., 0, 0, 0],
[41, 41, 41, ..., 0, 0, 0],
[41, 41, 41, ..., 0, 0, 0],
...,
[43, 43, 43, ..., 11, 11, 11],
[43, 43, 43, ..., 11, 11, 11],
[41, 43, 43, ..., 11, 11, 11]], dtype=uint8)
I can see those classifications there already. I'm going to get rid of some of them and just clip the raster to the area I'm interested in. And let's see what that looks like, too.
# Mask the landcover! I'm not going to display the raster after this because all of the corners will be NaN.
landcover_zip = landcover * mask
# Visualize the landcover!
show(landcover_zip, title='Landcover Raster', transform=transform, cmap='tab10')
plt.show()
Gorgeous. Now let's reclassify that based on what kind of area I want to live in. The table below shows the land cover types I'll be working with:
| Cell Value | Land Cover |
|---|---|
| 11 | Open Water |
| 12 | Perennial Ice/Snow |
| 21 | Developed, Open Space |
| 22 | Developed, Low Intensity |
| 23 | Developed, Medium Intensity |
| 24 | Developed, High Intensity |
| 31 | Barren Land (Rock/Sand/Clay) |
| 41 | Deciduous Forest |
| 42 | Evergreen Forest |
| 43 | Mixed Forest |
| 52 | Shrub/Scrub |
| 71 | Grassland/Herbaceous |
| 81 | Pasture/Hay |
| 82 | Cultivated Crops |
| 90 | Woody Wetlands |
| 95 | Emergent Herbaceous Wetlands |
And I'm going to assign suitability scores from one to five to these land cover types based on the values below (just my own personal preferences, really):
| Land Cover Types | Land Cover Codes | Suitability Level | Suitability Score |
|---|---|---|---|
| Developed (all) | 21, 22, 23, 24 | Very high suitability | 5 |
| Forest | 41, 42, 43 | High suitability | 4 |
| Grassland, shrub | 52, 71 | Medium suitability | 3 |
| Barren land, cropland | 31, 81, 82 | Low suitability | 2 |
| Wetlands, water, ice/snow | 11, 12, 90, 95 | Very low suitability | 1 |
# Create a new array to work from.
landcover_reclass = np.full(shape, np.NaN)
# Now to reclassify all those land cover types
landcover_reclass[(landcover_zip == 21) | (landcover_zip == 22) \
| (landcover_zip == 23) | (landcover_zip == 24)] = 5
landcover_reclass[(landcover_zip == 41) | (landcover_zip == 42) \
| (landcover_zip == 43)] = 4
landcover_reclass[(landcover_zip == 52) | (landcover_zip == 71)] = 3
landcover_reclass[(landcover_zip == 31) | (landcover_zip == 81) \
| (landcover_zip == 82)] = 2
landcover_reclass[(landcover_zip == 11) | (landcover_zip == 12) \
| (landcover_zip == 90) | (landcover_zip == 95)] = 1
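A more compact way to write those category tests, for anyone adapting this, is np.isin; a sketch with a toy land cover array using the same groupings:

```python
import numpy as np

# Toy land cover codes: water, developed, forest, shrub, barren, wetland, developed
toy = np.array([11, 21, 41, 52, 31, 90, 24])
reclass = np.full(toy.shape, np.nan)

# Same groupings as above, one np.isin call per suitability score
reclass[np.isin(toy, [21, 22, 23, 24])] = 5  # developed
reclass[np.isin(toy, [41, 42, 43])] = 4      # forest
reclass[np.isin(toy, [52, 71])] = 3          # grassland/shrub
reclass[np.isin(toy, [31, 81, 82])] = 2      # barren/cropland
reclass[np.isin(toy, [11, 12, 90, 95])] = 1  # wetlands/water/ice
print(reclass)  # [1. 5. 4. 3. 2. 1. 5.]
```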
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(landcover_reclass, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Landcover, Boston Region", fontsize=20)
plt.show()
Just about the opposite of the impervious surface raster - but that's okay! Let's keep going. This is great.
Time for trees! Let's load this raster in and see what the trees look like.
with rasterio.open('data/raster/NLCD/NLCD_2016_Tree_Canopy_Boston.tif') as tree:
trees = tree.read(1)
trees
array([[ 86, 88, 84, ..., 255, 255, 255],
[ 88, 89, 89, ..., 255, 255, 255],
[ 85, 89, 88, ..., 255, 255, 255],
...,
[ 85, 84, 89, ..., 0, 0, 0],
[ 89, 87, 87, ..., 0, 0, 0],
[ 89, 87, 86, ..., 0, 0, 0]], dtype=uint8)
Looks like 255 is the null value. I'm going to get rid of those and the non-Boston trees, and just clip the raster to the area I'm interested in. And let's see what that looks like, too.
# Mask the tree cover! I'm not going to display the raster after this because all of the corners will be NaN.
trees_zip = trees * mask
# Visualize the tree cover!
show(trees_zip, title='Tree Cover Raster', transform=transform, cmap='RdYlBu')
plt.show()
Lots of good looking trees in there. Now let's reclassify that based on the percentage of trees I want to live under (hint: more is better):
| Min Tree Cover (Inclusive) | Max Tree Cover (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| 90% | ... | Very high suitability | 5 |
| 75% | 90% | High suitability | 4 |
| 50% | 75% | Medium suitability | 3 |
| 25% | 50% | Low suitability | 2 |
| ... | 25% | Very low suitability | 1 |
# Create a new tree cover array
trees_reclass = np.full(shape, np.NaN)
# Reclassify that surface!
trees_reclass[trees_zip >= 90] = 5
trees_reclass[(trees_zip >= 75) & (trees_zip < 90)] = 4
trees_reclass[(trees_zip >= 50) & (trees_zip < 75)] = 3
trees_reclass[(trees_zip >= 25) & (trees_zip < 50)] = 2
trees_reclass[trees_zip < 25] = 1
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(trees_reclass, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Tree Cover, Boston Region", fontsize=20)
plt.show()
It's starting to look like I don't want to live in Somerville.... Ah well. Onward!
Time to rasterize some farmers markets! This will be interesting - I want to create a distance raster for these.
# Read in the farmers markets shapefile from MassGIS and convert the crs.
farmmarkets = gpd.read_file('data/vector/MassGIS/FARMERSMARKETS_PT/FARMERSMARKETS_PT.shp')
farmmarkets.to_crs('epsg:6491', inplace=True)
# Now I want to clip these farmers markets to only the extent of the Boston Region.
farmmarkets_bos = gpd.clip(farmmarkets, boston_mpo)
What do these look like, where are they on a map?
ax = farmmarkets_bos.plot(color='red',
figsize=(8,8) )
ax.set_title('Boston Region Farmers Markets')
cx.add_basemap(ax, zoom=10, crs='epsg:6491')
Nice! Lots of concentration in toward Boston and Cambridge. Now to turn them into rasters!
farmmarkets_raster = features.rasterize(farmmarkets_bos['geometry'],
out_shape=shape, fill=1, transform=transform, default_value=0)
Now to create the distance raster - we'll plot this one too, because it'll actually look like something.
farmmarkets_raster
array([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], dtype=uint8)
farmmarket_distance = ndimage.distance_transform_edt(farmmarkets_raster) * res[0]
show(farmmarket_distance, title="Distance to Farmers Markets", transform=transform)
plt.show()
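A note on the inverted fill/default_value in the rasterize call above: scipy's distance_transform_edt measures, for each nonzero cell, the distance to the nearest zero cell, so the markets are burned in as 0 on a background of 1. The function's sampling argument can also do the metres-per-pixel conversion instead of multiplying by res[0] afterward; a tiny sketch:

```python
import numpy as np
from scipy import ndimage

# 3x3 grid of ones with a single "market" pixel (value 0) at the top-left
grid = np.ones((3, 3))
grid[0, 0] = 0

# Each nonzero cell gets its distance to the nearest zero, in pixel units
dist = ndimage.distance_transform_edt(grid)
print(dist[0, 2])  # 2.0, two pixels along the top row

# sampling converts to map units directly, e.g. 30 m pixels
dist_m = ndimage.distance_transform_edt(grid, sampling=30)
print(dist_m[0, 2])  # 60.0
```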
So many farmers markets! I had no idea. Let's reclassify this based on how far I really want to walk for veggies.
| Min Distance (Inclusive) | Max Distance (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| ... | .5 miles (805 m) | Very high suitability | 5 |
| .5 miles (805 m) | 1 mile (1609 m) | High suitability | 4 |
| 1 mile (1609 m) | 2 miles (3218 m) | Medium suitability | 3 |
| 2 miles (3218 m) | 5 miles (8046 m) | Low suitability | 2 |
| 5 miles (8046 m) | ... | Very low suitability | 1 |
# Create a new raster, reclassify based on the rankings above.
farmers_reclass = np.full(shape, np.NaN)
farmers_reclass[farmmarket_distance < 805] = 5
farmers_reclass[(farmmarket_distance >= 805) & (farmmarket_distance < 1609)] = 4
farmers_reclass[(farmmarket_distance >= 1609) & (farmmarket_distance < 3218)] = 3
farmers_reclass[(farmmarket_distance >= 3218) & (farmmarket_distance < 8046)] = 2
farmers_reclass[farmmarket_distance >= 8046] = 1
# As ever, I want to take out the parts that aren't in the Boston region.
# Even though we clipped the markets above, the distance transform still ran over the full raster extent.
farmers_reclass_bos = farmers_reclass * mask
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(farmers_reclass_bos, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Distance to Farmers Markets, Boston Region", fontsize=20)
plt.show()
Definitely the weirdest looking map so far. To the library!
Time to rasterize all these wonderful libraries! We want a distance raster here too.
# Read in the libraries shapefile from MassGIS and convert the crs.
libraries = gpd.read_file('data/vector/MassGIS/LIBRARIES_PT/LIBRARIES_PT.shp')
libraries.to_crs('epsg:6491', inplace=True)
# Now I want to clip these libraries to only the extent of the Boston Region.
libraries_bos = gpd.clip(libraries, boston_mpo)
# Lets rasterize those libraries!
libraries_raster = features.rasterize(libraries_bos['geometry'],
out_shape=shape, fill=1,
transform=transform, default_value=0)
Where are these libraries? Let's see them on a map.
ax = libraries_bos.plot(color='green',
figsize=(8,8) )
ax.set_title('Boston Region Libraries')
cx.add_basemap(ax, zoom=10, crs='epsg:6491')
Gorgeous. Now to create the distance raster - we'll plot this one too, because it'll actually look like something.
libraries_raster
array([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], dtype=uint8)
libraries_distance = ndimage.distance_transform_edt(libraries_raster) * res[0]
show(libraries_distance, title="Distance to Libraries", transform=transform)
plt.show()
Even more libraries! I had no idea. Let's reclassify this based on how far I really want to walk for books. I'm using the same ratings as the ones for farmers markets.
| Min Distance (Inclusive) | Max Distance (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| ... | .5 miles (805 m) | Very high suitability | 5 |
| .5 miles (805 m) | 1 mile (1609 m) | High suitability | 4 |
| 1 mile (1609 m) | 2 miles (3218 m) | Medium suitability | 3 |
| 2 miles (3218 m) | 5 miles (8046 m) | Low suitability | 2 |
| 5 miles (8046 m) | ... | Very low suitability | 1 |
# Create a new raster, reclassify based on the rankings above.
libraries_reclass = np.full(shape, np.NaN)
libraries_reclass[libraries_distance < 805] = 5
libraries_reclass[(libraries_distance >= 805) & (libraries_distance < 1609)] = 4
libraries_reclass[(libraries_distance >= 1609) & (libraries_distance < 3218)] = 3
libraries_reclass[(libraries_distance >= 3218) & (libraries_distance < 8046)] = 2
libraries_reclass[libraries_distance >= 8046] = 1
# As ever, I want to take out the parts that aren't in the Boston region.
libraries_reclass_bos = libraries_reclass * mask
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(libraries_reclass_bos, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Distance to Libraries, Boston Region", fontsize=20)
plt.show()
Another very funny map. Those polka dots definitely want me to live in Somerville, even if tree cover doesn't!
Time to get OpenStreetMap in here! Let's find those restaurants. I'm going to bring in all of the restaurants in the Boston Region MPO. (A note for anyone following along: I'm on an older osmnx here; newer releases renamed geometries_from_polygon to features_from_polygon.)
# Create a version of the Boston Region MPO in WGS 84 - osmnx likes that better.
boston_wgs = boston_mpo.to_crs('epsg:4326')
# Bring 'em in!
polygon = boston_wgs.geometry.iat[0]
tags = {'amenity': 'restaurant'}
restaurants = ox.geometries_from_polygon(polygon, tags)
print('Retrieved', len(restaurants), 'objects.')
Retrieved 2057 objects.
That's so many restaurants. Let's see some of the relevant columns to see if this looks right.
cols = ['name', 'cuisine', 'addr:housenumber', 'addr:street', 'addr:city',
'phone', 'website']
restaurants[cols].head(-5)
| name | cuisine | addr:housenumber | addr:street | addr:city | phone | website | ||
|---|---|---|---|---|---|---|---|---|
| element_type | osmid | |||||||
| node | 257584601 | Millis House of Pizza | pizza | 915 | Main Street | NaN | +1-508-376-8500 | https://www.millishouseofpizza.com/ |
| 367779570 | Vineripe grill | american | 1281 | West Roxbury Parkway | Brookline | 617-278-4000 | https://www.vineripegrill.com/ | |
| 373016823 | Solea | NaN | NaN | NaN | NaN | NaN | NaN | |
| 530711262 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
| 530711266 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| way | 681923618 | 5 South Main | NaN | 5 | South Main Street | NaN | +1-781-383-3555 | https://5southmain.com/ |
| 709826141 | New York Fried Chicken | fried_chicken | 1179 | North Main Street | NaN | NaN | NaN | |
| 709826143 | N & H Saigon Sub | vietnamese | 1185 | North Main Street | NaN | NaN | NaN | |
| 845207542 | Top Notch Caribbean Cuisine | jamaican | 352 | North Main Street | NaN | +1-781-885-2797 | https://www.topnotchcaribbean.com/ | |
| 845207544 | Toast of the Town | breakfast;lunch | 348 | North Main Street | NaN | +1-781-961-3883 | https://www.toastofthetownma.com/ |
2052 rows × 7 columns
Hey, at least some of them have addresses. I'd like to fix the crs on this layer and then see what they look like.
restaurants.to_crs('epsg:6491', inplace=True)
ax = restaurants.plot(color='blue',
figsize=(8,8) )
ax.set_title('Boston Region Restaurants')
cx.add_basemap(ax, zoom=10, crs='epsg:6491')
Excellent. Time to rasterize those restaurants!
rest_raster = features.rasterize(restaurants['geometry'],
out_shape=shape, fill=1, transform=transform, default_value=0)
Now to create the distance raster - we'll plot this one too, because it'll actually look like something.
rest_raster
array([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...,
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1]], dtype=uint8)
rest_distance = ndimage.distance_transform_edt(rest_raster) * res[0]
show(rest_distance, title="Distance to Restaurants", transform=transform)
plt.show()
Heck yeah! Food everywhere. Let's reclassify this based on how far I really want to walk. I'm using the same ratings as the ones for farmers markets and books, because ultimately if I can't get there in 30 minutes, I can't get there at all.
| Min Distance (Inclusive) | Max Distance (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| ... | .5 miles (805 m) | Very high suitability | 5 |
| .5 miles (805 m) | 1 mile (1609 m) | High suitability | 4 |
| 1 mile (1609 m) | 2 miles (3218 m) | Medium suitability | 3 |
| 2 miles (3218 m) | 5 miles (8046 m) | Low suitability | 2 |
| 5 miles (8046 m) | ... | Very low suitability | 1 |
# Create a new raster, reclassify based on the rankings above.
rest_reclass = np.full(shape, np.NaN)
rest_reclass[rest_distance < 805] = 5
rest_reclass[(rest_distance >= 805) & (rest_distance < 1609)] = 4
rest_reclass[(rest_distance >= 1609) & (rest_distance < 3218)] = 3
rest_reclass[(rest_distance >= 3218) & (rest_distance < 8046)] = 2
rest_reclass[rest_distance >= 8046] = 1
# As ever, I want to take out the parts that aren't in the Boston region.
rest_reclass_bos = rest_reclass * mask
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(rest_reclass_bos, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Distance to Restaurants, Boston Region", fontsize=20)
plt.show()
That's a lot of blue in the very city-ish areas in the middle, and my stomach's a pretty big factor here. I might end up in more developed ZCTAs than I expected.
This one might be a little strange, using tabular data to create a raster. But we'll see what we get - I probably won't weight this very heavily anyway.
# Let's read in that age data!
age = pd.read_csv('data/tabular/ACSST5Y2019.S0101/ACSST5Y2019.S0101_data_with_overlays.csv',
skiprows=[1])
Now we have the age data! But there are way too many variables here; we just want the important ones. Let's drop all the extra stuff.
cols = ['GEO_ID', 'NAME', 'S0101_C02_008E']
age_reduce = age[cols].copy()
I know the S0101_C02_008E column is the percent of residents aged 30-34, but I also know it was read in as an object - so let's convert that.
age_reduce['age'] = age_reduce['S0101_C02_008E'].apply(pd.to_numeric, errors='coerce')
Now let's make that NAME column into something that'll actually work with the ZCTA geodataframe.
age_reduce['ZCTA'] = age_reduce['NAME'].str[5:].apply(pd.to_numeric)
age_reduce
| GEO_ID | NAME | S0101_C02_008E | age | ZCTA | |
|---|---|---|---|---|---|
| 0 | 8600000US01001 | ZCTA5 01001 | 7.1 | 7.1 | 1001 |
| 1 | 8600000US01002 | ZCTA5 01002 | 4.5 | 4.5 | 1002 |
| 2 | 8600000US01003 | ZCTA5 01003 | 0.0 | 0.0 | 1003 |
| 3 | 8600000US01005 | ZCTA5 01005 | 6.2 | 6.2 | 1005 |
| 4 | 8600000US01007 | ZCTA5 01007 | 5.0 | 5.0 | 1007 |
| ... | ... | ... | ... | ... | ... |
| 532 | 8600000US02777 | ZCTA5 02777 | 6.1 | 6.1 | 2777 |
| 533 | 8600000US02779 | ZCTA5 02779 | 3.1 | 3.1 | 2779 |
| 534 | 8600000US02780 | ZCTA5 02780 | 7.9 | 7.9 | 2780 |
| 535 | 8600000US02790 | ZCTA5 02790 | 3.9 | 3.9 | 2790 |
| 536 | 8600000US02791 | ZCTA5 02791 | 5.7 | 5.7 | 2791 |
537 rows × 5 columns
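That str[5:] slice leans on the fixed 'ZCTA5 ' prefix, and pd.to_numeric quietly eats the leading space and zero; a quick check on a toy frame (made-up rows, not the real ACS file):

```python
import pandas as pd

# Toy NAME values in the Census format (made-up, not the real file)
toy = pd.DataFrame({'NAME': ['ZCTA5 01001', 'ZCTA5 02139']})

# 'ZCTA5 01001'[5:] is ' 01001'; to_numeric parses it as the integer 1001
toy['ZCTA'] = toy['NAME'].str[5:].apply(pd.to_numeric)
print(toy['ZCTA'].tolist())  # [1001, 2139]
```

Note that converting to a number drops the leading zeros, which is exactly why the same numeric conversion gets applied to the geodataframe side before merging.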
Excellent. And now I'll make a similar numeric column in the ZCTA file - just to be absolutely certain they'll talk to each other.
zip_codes = zips_clipped.copy()
zip_codes['ZCTA'] = zip_codes['ZCTA5CE10'].apply(pd.to_numeric)
zip_codes
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | ZCTA | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | POLYGON ((243830.175 913871.662, 243877.753 91... | 1905 |
| 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((245713.029 914796.777, 245700.484 91... | 1904 |
| 21 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | POLYGON ((250741.171 926889.812, 250787.714 92... | 1915 |
| 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | N | MULTIPOLYGON (((222090.890 897305.465, 222058.... | 2462 |
| 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((208151.539 893024.178, 208131.535 89... | 1760 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 523 | 25 | 02071 | 2502071 | B5 | G6350 | S | 2513156 | 33452 | +42.1034247 | -071.2735885 | N | MULTIPOLYGON (((217320.943 872620.799, 217310.... | 2071 |
| 524 | 25 | 02343 | 2502343 | B5 | G6350 | S | 18146751 | 280250 | +42.1489129 | -071.0037371 | N | POLYGON ((243087.621 878272.261, 243078.760 87... | 2343 |
| 526 | 25 | 02129 | 2502129 | B5 | G6350 | S | 3492181 | 1266666 | +42.3796570 | -071.0614875 | N | POLYGON ((235383.264 902518.035, 235381.204 90... | 2129 |
| 527 | 25 | 02128 | 2502128 | B5 | G6350 | S | 12561059 | 2952872 | +42.3611289 | -071.0069754 | N | POLYGON ((237939.099 904138.351, 237949.737 90... | 2128 |
| 529 | 25 | 02151 | 2502151 | B5 | G6350 | S | 15190546 | 3291332 | +42.4182938 | -071.0012566 | N | POLYGON ((240343.022 905113.584, 240204.477 90... | 2151 |
161 rows × 13 columns
Perfect. Time to join this to the ZCTAs! Then we can rasterize this thing.
# I'm going to merge the age data with the ZCTA geodataframe.
zips_merge = zip_codes.merge(age_reduce, on='ZCTA', how='left')
zips_merge
| STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | ZCTA | GEO_ID | NAME | S0101_C02_008E | age | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | POLYGON ((243830.175 913871.662, 243877.753 91... | 1905 | 8600000US01905 | ZCTA5 01905 | 7.8 | 7.8 |
| 1 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((245713.029 914796.777, 245700.484 91... | 1904 | 8600000US01904 | ZCTA5 01904 | 6.8 | 6.8 |
| 2 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | POLYGON ((250741.171 926889.812, 250787.714 92... | 1915 | 8600000US01915 | ZCTA5 01915 | 5.7 | 5.7 |
| 3 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | N | MULTIPOLYGON (((222090.890 897305.465, 222058.... | 2462 | 8600000US02462 | ZCTA5 02462 | 4.8 | 4.8 |
| 4 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((208151.539 893024.178, 208131.535 89... | 1760 | 8600000US01760 | ZCTA5 01760 | 6.9 | 6.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 156 | 25 | 02071 | 2502071 | B5 | G6350 | S | 2513156 | 33452 | +42.1034247 | -071.2735885 | N | MULTIPOLYGON (((217320.943 872620.799, 217310.... | 2071 | 8600000US02071 | ZCTA5 02071 | 15.6 | 15.6 |
| 157 | 25 | 02343 | 2502343 | B5 | G6350 | S | 18146751 | 280250 | +42.1489129 | -071.0037371 | N | POLYGON ((243087.621 878272.261, 243078.760 87... | 2343 | 8600000US02343 | ZCTA5 02343 | 5.4 | 5.4 |
| 158 | 25 | 02129 | 2502129 | B5 | G6350 | S | 3492181 | 1266666 | +42.3796570 | -071.0614875 | N | POLYGON ((235383.264 902518.035, 235381.204 90... | 2129 | 8600000US02129 | ZCTA5 02129 | 13.8 | 13.8 |
| 159 | 25 | 02128 | 2502128 | B5 | G6350 | S | 12561059 | 2952872 | +42.3611289 | -071.0069754 | N | POLYGON ((237939.099 904138.351, 237949.737 90... | 2128 | 8600000US02128 | ZCTA5 02128 | 12.8 | 12.8 |
| 160 | 25 | 02151 | 2502151 | B5 | G6350 | S | 15190546 | 3291332 | +42.4182938 | -071.0012566 | N | POLYGON ((240343.022 905113.584, 240204.477 90... | 2151 | 8600000US02151 | ZCTA5 02151 | 9.2 | 9.2 |
161 rows × 17 columns
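With a left merge like this, any ZCTA that's missing from the ACS table silently comes through with NaN age values. Pandas' indicator flag makes that easy to audit; a sketch with toy frames (the 9999 ZCTA is invented to force a miss):

```python
import pandas as pd

# Toy frames: the 9999 ZCTA is invented so one row has no ACS match
left = pd.DataFrame({'ZCTA': [1905, 1904, 9999]})
right = pd.DataFrame({'ZCTA': [1905, 1904], 'age': [7.8, 6.8]})

merged = left.merge(right, on='ZCTA', how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))  # 1
```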
zips_merge.plot(column = 'age', cmap="RdYlBu", figsize=(8,8), legend=True)
plt.title("Percent of Residents Age 30-34, Boston Region ZCTAs")
plt.show()
Dang. Very, very low percentages of people in their early 30's! Now it's time to rasterize it and see something really cool.
shapes = ((geom, value) for geom, value in zip(zips_merge.geometry, zips_merge['age']))
age_raster = features.rasterize(shapes,
out_shape=shape, fill=1, transform=transform, default_value=0)
age_raster
array([[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]], dtype=float32)
We have an age raster! Let's reclassify this to keep it in line with all the other variables. Based on the map above, I'm dealing with pretty low percentages of early 30's residents, so I'll adjust my expectations slightly.
| Min Percentage 30-34 (Inclusive) | Max Percentage 30-34 (Exclusive) | Suitability Level | Suitability Score |
|---|---|---|---|
| 10% | ... | Very high suitability | 5 |
| 7% | 10% | High suitability | 4 |
| 4% | 7% | Medium suitability | 3 |
| 1% | 4% | Low suitability | 2 |
| ... | 1% | Very low suitability | 1 |
# Create a new raster, reclassify based on the rankings above.
age_reclass = np.full(shape, np.NaN)
age_reclass[age_raster >= 10] = 5
age_reclass[(age_raster >= 7) & (age_raster < 10)] = 4
age_reclass[(age_raster >= 4) & (age_raster < 7)] = 3
age_reclass[(age_raster >= 1) & (age_raster < 4)] = 2
age_reclass[age_raster < 1] = 1
# As ever, I want to take out the parts that aren't in the Boston region.
age_reclass_bos = age_reclass * mask
# Let's see what it looks like!
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(age_reclass_bos, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Reclassified Age Raster, Boston Region", fontsize=20)
plt.show()
Definitely the least pleasant of all the maps above - but still, there are a few dark blue ZCTAs!
Well, we've made it this far looking at one variable at a time. But what do they all look like together? Let's put these scores together and see what happens!
living_unweighted = impervious_reclass + landcover_reclass + trees_reclass \
+ farmers_reclass_bos + libraries_reclass_bos + rest_reclass_bos + age_reclass_bos
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(living_unweighted, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Suitability for Kelsey's Living (Unweighted), Boston Region", fontsize=20)
plt.show()
Looks like all of those market, restaurant, and library dots were worth a lot! This is pretty neat - but how about weighted? I'll use the weights below. As you can see, my stomach drives all of my decisions.
| Component | Weight |
|---|---|
| Impervious surface | 10% |
| Land cover | 10% |
| Tree cover | 10% |
| Distance to farmers markets | 20% |
| Distance to libraries | 15% |
| Distance to restaurants | 20% |
| Percent of residents in early 30's | 15% |
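Since each component is scored 1-5, the weights need to sum to 1.0 for the weighted composite to stay on that same 1-5 scale. A quick sanity check on the table above (the dict keys are just shorthand labels I made up):

```python
# Weights from the table above; the keys are informal labels.
weights = {
    "impervious": 0.10, "landcover": 0.10, "trees": 0.10,
    "farmers": 0.20, "libraries": 0.15, "restaurants": 0.20, "age": 0.15,
}

total = sum(weights.values())
# Compare with a tolerance rather than ==, since floats accumulate error.
assert abs(total - 1.0) < 1e-9
print(round(total, 10))  # 1.0
```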
living_weighted = (impervious_reclass*.1) + (landcover_reclass*.1) \
+ (trees_reclass*.1) + (farmers_reclass_bos*.20) \
+ (libraries_reclass_bos*.15) + (rest_reclass_bos*.20) + (age_reclass_bos * .15)
fig, ax = plt.subplots(figsize=(8,8))
image = ax.imshow(living_weighted, cmap="RdYlBu",
extent = (bounds.left, bounds.right, bounds.bottom, bounds.top))
fig.colorbar(image, cax = fig.add_axes([.95, 0.48, 0.03, 0.38]))
fig.suptitle("Suitability for Kelsey's Living (Weighted), Boston Region", fontsize=20)
plt.show()
Yep, that city area just keeps looking better and better. I can always ride my bike to the Fells.
But what if we look at zonal stats and see exactly which ZCTA is the most suitable?
I need to start by resetting the index on my ZCTA gdf, to make sure the mean values can map to it.
zips_reset = zips_clipped.reset_index()
zips_reset
| | index | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 | 25 | 01905 | 2501905 | B5 | G6350 | S | 9219345 | 1195154 | +42.4659985 | -070.9757922 | N | POLYGON ((243830.175 913871.662, 243877.753 91... |
| 1 | 18 | 25 | 01904 | 2501904 | B5 | G6350 | S | 11708211 | 1303900 | +42.4924563 | -070.9739297 | N | POLYGON ((245713.029 914796.777, 245700.484 91... |
| 2 | 21 | 25 | 01915 | 2501915 | B5 | G6350 | S | 39091336 | 3958118 | +42.5702688 | -070.8669962 | N | POLYGON ((250741.171 926889.812, 250787.714 92... |
| 3 | 35 | 25 | 02462 | 2502462 | B5 | G6350 | S | 1369318 | 69749 | +42.3287076 | -071.2559002 | N | MULTIPOLYGON (((222090.890 897305.465, 222058.... |
| 4 | 37 | 25 | 01760 | 2501760 | B5 | G6350 | S | 38359306 | 2599042 | +42.2848223 | -071.3488109 | N | POLYGON ((208151.539 893024.178, 208131.535 89... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 156 | 523 | 25 | 02071 | 2502071 | B5 | G6350 | S | 2513156 | 33452 | +42.1034247 | -071.2735885 | N | MULTIPOLYGON (((217320.943 872620.799, 217310.... |
| 157 | 524 | 25 | 02343 | 2502343 | B5 | G6350 | S | 18146751 | 280250 | +42.1489129 | -071.0037371 | N | POLYGON ((243087.621 878272.261, 243078.760 87... |
| 158 | 526 | 25 | 02129 | 2502129 | B5 | G6350 | S | 3492181 | 1266666 | +42.3796570 | -071.0614875 | N | POLYGON ((235383.264 902518.035, 235381.204 90... |
| 159 | 527 | 25 | 02128 | 2502128 | B5 | G6350 | S | 12561059 | 2952872 | +42.3611289 | -071.0069754 | N | POLYGON ((237939.099 904138.351, 237949.737 90... |
| 160 | 529 | 25 | 02151 | 2502151 | B5 | G6350 | S | 15190546 | 3291332 | +42.4182938 | -071.0012566 | N | POLYGON ((240343.022 905113.584, 240204.477 90... |
161 rows × 13 columns
Looks good. Now let's create those zonal stats for the unweighted version and map it!
unweighted_stats = pd.DataFrame(zonal_stats(zips_clipped, living_unweighted,
                                            affine=transform, stats=['mean'], nodata=np.nan))
zips_unweighted = zips_reset.join(unweighted_stats)
zips_unweighted.plot(column = 'mean', cmap="RdYlBu", figsize=(8,8), legend=True)
plt.title("Unweighted Suitability Rankings for Kelsey, Boston Region ZCTAs")
plt.show()
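For anyone who hasn't used `rasterstats` before: a zonal mean is just the average of the raster cells that fall inside each zone. A toy sketch of the idea with a tiny array and made-up integer zone labels (nothing here comes from the actual analysis):

```python
import numpy as np

# Hypothetical suitability raster and matching zone labels
suitability = np.array([[5., 4., 1.],
                        [3., 2., 1.]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2]])

# Mean suitability per zone - the same 'mean' statistic zonal_stats computes,
# except zonal_stats derives the zone membership from vector geometries.
for z in np.unique(zones):
    print(z, suitability[zones == z].mean())
# 1 4.0
# 2 1.3333333333333333
```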
In this unweighted universe, which is the worst-ranked ZCTA for me to live in?
sort_unweighted = zips_unweighted.sort_values(by='mean', ascending=True)
sort_unweighted.head(1)
| | index | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 244 | 25 | 02047 | 2502047 | B5 | G6350 | S | 619346 | 21837 | +42.1337338 | -070.6858599 | N | POLYGON ((267358.196 875819.198, 267395.076 87... | 14.446043 |
02047 it is! Don't think I'll go hang out there. What about the best?
sort_unweighted = zips_unweighted.sort_values(by='mean', ascending=False)
sort_unweighted.head(1)
| | index | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52 | 183 | 25 | 02130 | 2502130 | B5 | G6350 | S | 8635553 | 328448 | +42.3091743 | -071.1138354 | N | MULTIPOLYGON (((230455.412 894510.822, 230474.... | 28.233906 |
02130 is in first place with a mean suitability score of 28.23. Pretty darn cool. That's Jamaica Plain!
But what about the weighted version?
weighted_stats = pd.DataFrame(zonal_stats(zips_clipped, living_weighted,
                                          affine=transform, stats=['mean'], nodata=np.nan))
zips_weighted = zips_reset.join(weighted_stats)
zips_weighted.plot(column = 'mean', cmap="RdYlBu", figsize=(8,8), legend=True)
plt.title("Weighted Suitability Rankings for Kelsey, Boston Region ZCTAs")
plt.show()
Yep, those internal zip codes are calling to me. Out of all that blue, what's the best ZCTA for me?
sort_weighted = zips_weighted.sort_values(by='mean', ascending=False)
sort_weighted.head(1)
| | index | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 182 | 25 | 02113 | 2502113 | B5 | G6350 | S | 260829 | 0 | +42.3653348 | -071.0552357 | N | POLYGON ((236709.574 901967.559, 236727.610 90... | 4.298625 |
02113! That's quite a coup. What about the worst? I see that dark red spot out in the northwest...
sort_weighted = zips_weighted.sort_values(by='mean', ascending=True)
sort_weighted.head(1)
| | index | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 244 | 25 | 02047 | 2502047 | B5 | G6350 | S | 619346 | 21837 | +42.1337338 | -070.6858599 | N | POLYGON ((267358.196 875819.198, 267395.076 87... | 1.926739 |
Interesting - at the bottom of the table, weighting changed nothing! It's 02047 again.
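Incidentally, instead of sorting twice, pandas can pull the best and worst rows in one pass with `idxmax`/`idxmin`. A sketch with a toy table (the scores below are made up for illustration):

```python
import pandas as pd

# Hypothetical results table: a few ZCTAs with mean suitability scores
df = pd.DataFrame({"ZCTA5CE10": ["02130", "02047", "02113"],
                   "mean": [28.23, 14.45, 25.10]})

# idxmax/idxmin return the row label of the max/min value in the column
best = df.loc[df["mean"].idxmax(), "ZCTA5CE10"]
worst = df.loc[df["mean"].idxmin(), "ZCTA5CE10"]
print(best, worst)  # 02130 02047
```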
It turns out that the most suitable place for me to live in the Boston region is the North End, and the worst is Humarock. I had never even heard of it - apparently only 180 people live there. The North End it is!
m = folium.Map(location=[42.3317600, -71.1211600], zoom_start=11)
folium.Marker(
[42.365, -71.055], popup="<i>North End, Boston, MA</i>",
tooltip="The Winner: North End, Boston",
icon = folium.Icon(color='blue'),
).add_to(m)
folium.Marker(
[42.142836, -70.69353], popup="<i>Humarock, MA</i>",
tooltip="The Least Suitable: Humarock",
icon = folium.Icon(color='red'),
).add_to(m)
m